Compressed String Dictionary Look-Up with Edit Distance One
Abstract
In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space. Given a pattern P, the index has to report all the strings in the dictionary within edit distance one of P. Our first solution answers queries in (almost optimal) O(|P| + occ) time, where occ is the number of strings in the dictionary within edit distance one of P. The space complexity of this solution is bounded in terms of the k-th order entropy of the indexed dictionary. Our second solution further improves this space complexity at the cost of increased query time.
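To make the query semantics concrete, the following is a minimal Python sketch of a naive baseline: a linear scan that reports every dictionary string within edit distance one of P. It is not the compressed index described in the abstract, and it runs in time proportional to the total dictionary length rather than O(|P| + occ). The function names `within_edit_distance_one` and `lookup` are hypothetical, and edit distance is assumed here to mean Levenshtein distance (insertion, deletion, substitution).

```python
from typing import Iterable, List


def within_edit_distance_one(a: str, b: str) -> bool:
    """Return True if a and b differ by at most one insertion,
    deletion, or substitution (Levenshtein distance <= 1)."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    # Skip the longest common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Either the strings are identical, or position i is the
        # single substitution and the remainders must match.
        return a[i + 1:] == b[i + 1:]
    # b has one extra character at position i; the rest must match.
    return a[i:] == b[i + 1:]


def lookup(dictionary: Iterable[str], pattern: str) -> List[str]:
    """Naive baseline: scan the whole dictionary (assumed to fit in memory)."""
    return [s for s in dictionary if within_edit_distance_one(s, pattern)]
```

For example, calling `lookup(["cat", "cut", "cats", "at", "dog", "act", "coat"], "cat")` keeps the exact match and the strings reachable by one substitution, insertion, or deletion, and drops "dog" and "act" (a transposition counts as two edits under this assumption).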
Similar resources
Dictionary Look-Up within Small Edit Distance
Let W be a dictionary consisting of n binary strings of length m each, represented as a trie. The usual d-query asks if there exists a string in W within Hamming distance d of a given binary query string q. We present an algorithm to determine if there is a member in W within edit distance d of a given query string q of length m. The method takes time O(dm^{d+1}) in the RAM model, independent of ...
Efficient approximate dictionary look-up over small alphabets
Given a dictionary W consisting of n binary strings of length m each, a d-query asks if there exists a string in W within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 [10] as a challenge to data structure design. Efficient solutions have been developed only for the special case when d = 1 (the 1-query problem). We assume the standard RA...
Approximate string matching algorithms for limited-vocabulary OCR output correction
Five methods for matching words mistranslated by optical character recognition to their most likely match in a reference dictionary were tested on data from the archives of the National Library of Medicine. The methods, including an adaptation of the cross correlation algorithm, the generic edit distance algorithm, the edit distance algorithm with a probabilistic substitution matrix, Bayesian a...
A fast algorithm for finding the nearest neighbor of a word in a dictionary
In this paper a new algorithm for string edit distance computation is proposed. It is based on the classical approach [11]. However, while in [11] the two strings to be compared may be given online, our algorithm assumes that one of the two strings to be compared is a dictionary entry that is known a priori. This dictionary word is converted, in an off-line phase to be carried out beforehand, in...
Algorithme de recherche approximative dans un dictionnaire fondé sur une distance d'édition définie par blocs
We propose an algorithm for approximate dictionary lookup, where altered strings are matched against reference forms. The algorithm makes use of a divergence function between strings, broadly belonging to the family of edit distances; it finds dictionary entries whose distance to the search string is below a certain threshold. The divergence function is not the classical edit distance (DL dis...